Overview

Dataset statistics

Number of variables: 11
Number of observations: 16512
Missing cells: 158
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 MiB
Average record size in memory: 88.0 B

Variable types

Numeric: 10
Categorical: 1

Alerts

longitude is highly correlated with latitude [High correlation]
latitude is highly correlated with longitude [High correlation]
total_rooms is highly correlated with total_bedrooms and 2 other fields [High correlation]
total_bedrooms is highly correlated with total_rooms and 2 other fields [High correlation]
population is highly correlated with total_rooms and 2 other fields [High correlation]
households is highly correlated with total_rooms and 2 other fields [High correlation]
median_income is highly correlated with median_house_value [High correlation]
median_house_value is highly correlated with median_income [High correlation]
df_index is highly correlated with longitude and 3 other fields [High correlation]
longitude is highly correlated with df_index and 3 other fields [High correlation]
latitude is highly correlated with df_index and 3 other fields [High correlation]
median_house_value is highly correlated with df_index and 4 other fields [High correlation]
ocean_proximity is highly correlated with df_index and 3 other fields [High correlation]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]

Reproduction

Analysis started: 2022-07-10 20:18:28.876997
Analysis finished: 2022-07-10 20:18:44.361380
Duration: 15.48 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 16512
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10331.45452
Minimum: 0
Maximum: 20638
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1027.55
Q1: 5157.75
Median: 10339.5
Q3: 15521.25
95-th percentile: 19628.45
Maximum: 20638
Range: 20638
Interquartile range (IQR): 10363.5

Descriptive statistics

Standard deviation: 5979.056892
Coefficient of variation (CV): 0.5787236329
Kurtosis: -1.207383237
Mean: 10331.45452
Median Absolute Deviation (MAD): 5182
Skewness: -0.003143328072
Sum: 170592977
Variance: 35749121.32
Monotonicity: Not monotonic
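
Several of the figures reported for df_index follow from one another by simple arithmetic. As a minimal sketch (the helper name `derived_stats` is hypothetical and is not part of pandas-profiling), the range, IQR, coefficient of variation, and variance can be recomputed from the other reported values:

```python
def derived_stats(minimum, maximum, q1, q3, mean, std_dev):
    """Recompute summary figures that follow from other reported values."""
    return {
        "range": maximum - minimum,   # Range = Maximum - Minimum
        "iqr": q3 - q1,               # IQR = Q3 - Q1
        "cv": std_dev / mean,         # CV = standard deviation / mean
        "variance": std_dev ** 2,     # Variance = standard deviation squared
    }


# Values copied from the df_index summary in this report.
stats = derived_stats(
    minimum=0, maximum=20638,
    q1=5157.75, q3=15521.25,
    mean=10331.45452, std_dev=5979.056892,
)

for name, value in stats.items():
    print(f"{name}: {value:.10g}")
```

The recomputed values should agree with the reported Range, IQR, CV, and Variance up to the rounding of the printed inputs.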
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
0                     1       < 0.1%
3435                  1       < 0.1%
3371                  1       < 0.1%
13612                 1       < 0.1%
15661                 1       < 0.1%
9518                  1       < 0.1%
17714                 1       < 0.1%
19763                 1       < 0.1%
5432                  1       < 0.1%
7481                  1       < 0.1%
Other values (16502)  16502   99.9%

Minimum 10 values
Value                 Count   Frequency (%)
0                     1       < 0.1%
1                     1       < 0.1%
2                     1       < 0.1%
3                     1       < 0.1%
4                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
7                     1       < 0.1%
8                     1       < 0.1%
9                     1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
20638                 1       < 0.1%
20637                 1       < 0.1%
20636                 1       < 0.1%
20635                 1       < 0.1%
20634                 1       < 0.1%
20633                 1       < 0.1%
20632                 1       < 0.1%
20630                 1       < 0.1%
20629                 1       < 0.1%
20628                 1       < 0.1%

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 825
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5756353
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Negative: 16512
Negative (%): 100.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.8
Median: -118.51
Q3: -118.01
95-th percentile: -117.07
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.001828066
Coefficient of variation (CV): -0.01674110332
Kurtosis: -1.334715391
Mean: -119.5756353
Median Absolute Deviation (MAD): 1.29
Skewness: -0.2935554126
Sum: -1974432.89
Variance: 4.007315604
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
-118.31               133     0.8%
-118.3                129     0.8%
-118.29               121     0.7%
-118.27               113     0.7%
-118.35               113     0.7%
-118.28               107     0.6%
-118.19               107     0.6%
-118.14               101     0.6%
-118.36               101     0.6%
-118.26               99      0.6%
Other values (815)    15388   93.2%

Minimum 10 values
Value                 Count   Frequency (%)
-124.35               1       < 0.1%
-124.3                2       < 0.1%
-124.27               1       < 0.1%
-124.26               1       < 0.1%
-124.25               1       < 0.1%
-124.23               3       < 0.1%
-124.22               1       < 0.1%
-124.21               3       < 0.1%
-124.19               4       < 0.1%
-124.18               5       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
-114.31               1       < 0.1%
-114.47               1       < 0.1%
-114.49               1       < 0.1%
-114.55               1       < 0.1%
-114.57               2       < 0.1%
-114.58               2       < 0.1%
-114.59               2       < 0.1%
-114.6                1       < 0.1%
-114.63               1       < 0.1%
-114.64               1       < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 839
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63931444
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.94
Median: 34.26
Q3: 37.72
95-th percentile: 38.95
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 2.13796281
Coefficient of variation (CV): 0.05998888709
Kurtosis: -1.11725371
Mean: 35.63931444
Median Absolute Deviation (MAD): 1.23
Skewness: 0.4614558848
Sum: 588476.36
Variance: 4.570884976
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
34.06                 207     1.3%
34.05                 193     1.2%
34.07                 191     1.2%
34.08                 191     1.2%
34.02                 169     1.0%
34.09                 169     1.0%
34.04                 164     1.0%
34.1                  158     1.0%
34.03                 152     0.9%
33.97                 147     0.9%
Other values (829)    14771   89.5%

Minimum 10 values
Value                 Count   Frequency (%)
32.54                 1       < 0.1%
32.55                 2       < 0.1%
32.56                 8       < 0.1%
32.57                 15      0.1%
32.58                 20      0.1%
32.59                 10      0.1%
32.6                  9       0.1%
32.61                 14      0.1%
32.62                 12      0.1%
32.63                 15      0.1%

Maximum 10 values
Value                 Count   Frequency (%)
41.95                 1       < 0.1%
41.92                 1       < 0.1%
41.88                 1       < 0.1%
41.86                 3       < 0.1%
41.84                 1       < 0.1%
41.81                 1       < 0.1%
41.8                  3       < 0.1%
41.79                 1       < 0.1%
41.78                 2       < 0.1%
41.77                 1       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.65340359
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
Median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.57481861
Coefficient of variation (CV): 0.4388595085
Kurtosis: -0.7967859407
Mean: 28.65340359
Median Absolute Deviation (MAD): 10
Skewness: 0.0594681207
Sum: 473125
Variance: 158.1260632
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
52                    1027    6.2%
36                    696     4.2%
35                    668     4.0%
16                    626     3.8%
17                    562     3.4%
34                    557     3.4%
33                    504     3.1%
26                    491     3.0%
25                    458     2.8%
32                    449     2.7%
Other values (42)     10474   63.4%

Minimum 10 values
Value                 Count   Frequency (%)
1                     3       < 0.1%
2                     46      0.3%
3                     51      0.3%
4                     150     0.9%
5                     188     1.1%
6                     131     0.8%
7                     139     0.8%
8                     166     1.0%
9                     163     1.0%
10                    212     1.3%

Maximum 10 values
Value                 Count   Frequency (%)
52                    1027    6.2%
51                    39      0.2%
50                    106     0.6%
49                    108     0.7%
48                    146     0.9%
47                    152     0.9%
46                    180     1.1%
45                    238     1.4%
44                    268     1.6%
43                    289     1.8%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5494
Distinct (%): 33.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2622.539789
Minimum: 6
Maximum: 39320
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 6
5-th percentile: 624.55
Q1: 1443
Median: 2119
Q3: 3141
95-th percentile: 6187.8
Maximum: 39320
Range: 39314
Interquartile range (IQR): 1698

Descriptive statistics

Standard deviation: 2138.41708
Coefficient of variation (CV): 0.8153992894
Kurtosis: 31.6770306
Mean: 2622.539789
Median Absolute Deviation (MAD): 799
Skewness: 4.000835511
Sum: 43303377
Variance: 4572827.61
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
1527                  16      0.1%
1582                  15      0.1%
1471                  14      0.1%
1705                  13      0.1%
1462                  13      0.1%
1613                  13      0.1%
1283                  12      0.1%
1703                  12      0.1%
1745                  12      0.1%
2053                  12      0.1%
Other values (5484)   16380   99.2%

Minimum 10 values
Value                 Count   Frequency (%)
6                     1       < 0.1%
15                    2       < 0.1%
16                    1       < 0.1%
18                    2       < 0.1%
19                    2       < 0.1%
20                    1       < 0.1%
21                    1       < 0.1%
22                    1       < 0.1%
24                    1       < 0.1%
25                    1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
39320                 1       < 0.1%
37937                 1       < 0.1%
32054                 1       < 0.1%
30450                 1       < 0.1%
30401                 1       < 0.1%
28258                 1       < 0.1%
27700                 1       < 0.1%
25135                 1       < 0.1%
23915                 1       < 0.1%
23866                 1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1810
Distinct (%): 11.1%
Missing: 158
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 534.9146386
Minimum: 2
Maximum: 6210
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 2
5-th percentile: 137
Q1: 295
Median: 433
Q3: 644
95-th percentile: 1268
Maximum: 6210
Range: 6208
Interquartile range (IQR): 349

Descriptive statistics

Standard deviation: 412.6656494
Coefficient of variation (CV): 0.7714607521
Kurtosis: 19.55191916
Mean: 534.9146386
Median Absolute Deviation (MAD): 162
Skewness: 3.269270493
Sum: 8747994
Variance: 170292.9382
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
272                   44      0.3%
280                   44      0.3%
393                   43      0.3%
328                   41      0.2%
345                   41      0.2%
331                   41      0.2%
309                   40      0.2%
394                   39      0.2%
292                   39      0.2%
348                   39      0.2%
Other values (1800)   15943   96.6%
(Missing)             158     1.0%

Minimum 10 values
Value                 Count   Frequency (%)
2                     1       < 0.1%
3                     3       < 0.1%
4                     4       < 0.1%
5                     5       < 0.1%
6                     3       < 0.1%
7                     5       < 0.1%
8                     7       < 0.1%
9                     5       < 0.1%
10                    6       < 0.1%
11                    6       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
6210                  1       < 0.1%
5471                  1       < 0.1%
5290                  1       < 0.1%
5033                  1       < 0.1%
4957                  1       < 0.1%
4819                  1       < 0.1%
4585                  1       < 0.1%
4457                  1       < 0.1%
4407                  1       < 0.1%
4386                  1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3619
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1419.687379
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 3
5-th percentile: 347.55
Q1: 784
Median: 1164
Q3: 1719
95-th percentile: 3276.9
Maximum: 35682
Range: 35679
Interquartile range (IQR): 935

Descriptive statistics

Standard deviation: 1115.663036
Coefficient of variation (CV): 0.7858512042
Kurtosis: 71.79207705
Mean: 1419.687379
Median Absolute Deviation (MAD): 441
Skewness: 4.741568331
Sum: 23441878
Variance: 1244704.01
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
1227                  21      0.1%
753                   20      0.1%
1005                  20      0.1%
891                   19      0.1%
761                   19      0.1%
825                   19      0.1%
1056                  18      0.1%
986                   18      0.1%
662                   18      0.1%
1011                  18      0.1%
Other values (3609)   16322   98.8%

Minimum 10 values
Value                 Count   Frequency (%)
3                     1       < 0.1%
8                     3       < 0.1%
9                     2       < 0.1%
11                    1       < 0.1%
13                    2       < 0.1%
14                    3       < 0.1%
15                    2       < 0.1%
17                    2       < 0.1%
18                    1       < 0.1%
19                    1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
35682                 1       < 0.1%
16305                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%
15037                 1       < 0.1%
13251                 1       < 0.1%
12427                 1       < 0.1%
12203                 1       < 0.1%
11973                 1       < 0.1%
11272                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1691
Distinct (%): 10.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 497.0118096
Minimum: 2
Maximum: 5358
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 2
5-th percentile: 124
Q1: 279
Median: 408
Q3: 602
95-th percentile: 1150
Maximum: 5358
Range: 5356
Interquartile range (IQR): 323

Descriptive statistics

Standard deviation: 375.6961563
Coefficient of variation (CV): 0.7559099182
Kurtosis: 19.29465292
Mean: 497.0118096
Median Absolute Deviation (MAD): 150
Skewness: 3.222081325
Sum: 8206659
Variance: 141147.6019
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
429                   47      0.3%
335                   47      0.3%
386                   45      0.3%
306                   45      0.3%
316                   44      0.3%
278                   44      0.3%
375                   43      0.3%
297                   42      0.3%
340                   42      0.3%
284                   42      0.3%
Other values (1681)   16071   97.3%

Minimum 10 values
Value                 Count   Frequency (%)
2                     1       < 0.1%
3                     3       < 0.1%
4                     3       < 0.1%
5                     5       < 0.1%
6                     4       < 0.1%
7                     6       < 0.1%
8                     7       < 0.1%
9                     8       < 0.1%
10                    7       < 0.1%
11                    3       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
5358                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4769                  1       < 0.1%
4339                  1       < 0.1%
4204                  1       < 0.1%
4176                  1       < 0.1%
4072                  1       < 0.1%
4012                  1       < 0.1%
3958                  1       < 0.1%

median_income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10905
Distinct (%): 66.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.875884278
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.603355
Q1: 2.56695
Median: 3.54155
Q3: 4.745325
95-th percentile: 7.308945
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.178375

Descriptive statistics

Standard deviation: 1.904930518
Coefficient of variation (CV): 0.4914828157
Kurtosis: 4.933255636
Mean: 3.875884278
Median Absolute Deviation (MAD): 1.06035
Skewness: 1.653352864
Sum: 63998.6012
Variance: 3.62876028
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
2.875                 38      0.2%
15.0001               38      0.2%
2.625                 37      0.2%
3.125                 37      0.2%
3.375                 33      0.2%
3.625                 32      0.2%
4.125                 32      0.2%
3.875                 31      0.2%
3                     29      0.2%
4                     28      0.2%
Other values (10895)  16177   98.0%

Minimum 10 values
Value                 Count   Frequency (%)
0.4999                9       0.1%
0.536                 7       < 0.1%
0.5495                1       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%
0.6825                1       < 0.1%
0.6991                1       < 0.1%
0.7007                1       < 0.1%
0.7025                1       < 0.1%
0.7054                1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
15.0001               38      0.2%
15                    2       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%
14.4113               1       < 0.1%
14.2959               1       < 0.1%
13.947                1       < 0.1%
13.8556               1       < 0.1%
13.8093               1       < 0.1%
13.6842               1       < 0.1%

median_house_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3669
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207005.3224
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.1 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66400
Q1: 119800
Median: 179500
Q3: 263900
95-th percentile: 494535
Maximum: 500001
Range: 485002
Interquartile range (IQR): 144100

Descriptive statistics

Standard deviation: 115701.2973
Coefficient of variation (CV): 0.5589290938
Kurtosis: 0.3342974568
Mean: 207005.3224
Median Absolute Deviation (MAD): 68100
Skewness: 0.9868126102
Sum: 3418071883
Variance: 1.338679019 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
500001                786     4.8%
137500                102     0.6%
162500                91      0.6%
112500                80      0.5%
187500                76      0.5%
225000                70      0.4%
350000                65      0.4%
87500                 59      0.4%
150000                58      0.4%
175000                52      0.3%
Other values (3659)   15073   91.3%

Minimum 10 values
Value                 Count   Frequency (%)
14999                 3       < 0.1%
22500                 3       < 0.1%
26600                 1       < 0.1%
26900                 1       < 0.1%
27500                 1       < 0.1%
28300                 1       < 0.1%
30000                 1       < 0.1%
32500                 3       < 0.1%
32900                 1       < 0.1%
33200                 1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
500001                786     4.8%
500000                23      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%
498700                1       < 0.1%
498600                1       < 0.1%
498400                1       < 0.1%
497600                1       < 0.1%
497400                1       < 0.1%

ocean_proximity
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 129.1 KiB

<1H OCEAN: 7277
INLAND: 5262
NEAR OCEAN: 2124
NEAR BAY: 1847
ISLAND: 2

Length

Max length: 10
Median length: 9
Mean length: 8.060380329
Min length: 6

Characters and Unicode

Total characters: 133093
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: INLAND
2nd row: NEAR OCEAN
3rd row: INLAND
4th row: NEAR OCEAN
5th row: <1H OCEAN

Common Values

Value                 Count   Frequency (%)
<1H OCEAN             7277    44.1%
INLAND                5262    31.9%
NEAR OCEAN            2124    12.9%
NEAR BAY              1847    11.2%
ISLAND                2       < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value                 Count   Frequency (%)
ocean                 9401    33.9%
1h                    7277    26.2%
inland                5262    19.0%
near                  3971    14.3%
bay                   1847    6.7%
island                2       < 0.1%

Most occurring characters

Value                 Count   Frequency (%)
N                     23898   18.0%
A                     20483   15.4%
E                     13372   10.0%
(space)               11248   8.5%
O                     9401    7.1%
C                     9401    7.1%
<                     7277    5.5%
1                     7277    5.5%
H                     7277    5.5%
I                     5264    4.0%
Other values (6)      18195   13.7%

Most occurring categories

Value                 Count   Frequency (%)
Uppercase Letter      107291  80.6%
Space Separator       11248   8.5%
Math Symbol           7277    5.5%
Decimal Number        7277    5.5%

Most frequent character per category

Uppercase Letter
Value                 Count   Frequency (%)
N                     23898   22.3%
A                     20483   19.1%
E                     13372   12.5%
O                     9401    8.8%
C                     9401    8.8%
H                     7277    6.8%
I                     5264    4.9%
L                     5264    4.9%
D                     5264    4.9%
R                     3971    3.7%
Other values (3)      3696    3.4%

Space Separator
Value                 Count   Frequency (%)
(space)               11248   100.0%

Math Symbol
Value                 Count   Frequency (%)
<                     7277    100.0%

Decimal Number
Value                 Count   Frequency (%)
1                     7277    100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Latin                 107291  80.6%
Common                25802   19.4%

Most frequent character per script

Latin
Value                 Count   Frequency (%)
N                     23898   22.3%
A                     20483   19.1%
E                     13372   12.5%
O                     9401    8.8%
C                     9401    8.8%
H                     7277    6.8%
I                     5264    4.9%
L                     5264    4.9%
D                     5264    4.9%
R                     3971    3.7%
Other values (3)      3696    3.4%

Common
Value                 Count   Frequency (%)
(space)               11248   43.6%
<                     7277    28.2%
1                     7277    28.2%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 133093  100.0%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
N                     23898   18.0%
A                     20483   15.4%
E                     13372   10.0%
(space)               11248   8.5%
O                     9401    7.1%
C                     9401    7.1%
<                     7277    5.5%
1                     7277    5.5%
H                     7277    5.5%
I                     5264    4.0%
Other values (6)      18195   13.7%

Interactions

[Figure: pairwise interaction plots between the numeric variables; images not reproduced.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
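
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code pandas-profiling runs; the function names `average_ranks`, `pearson_r`, and `spearman_rho` and the toy data are hypothetical, and tied values are assumed to receive the average of their ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)  # the 1/n factors cancel


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data (made up): y = x**2 is monotonic but not linear in x.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy data ρ reaches its maximum while r stays below 1, which is the sense in which ρ captures nonlinear monotonic relationships.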

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
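
As a minimal sketch of this formula (the function name `pearson_r` and the toy data are hypothetical, not taken from the report), r can be computed directly from the definition. The example also checks the invariance under a positive change of scale and a shift in location mentioned above.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant variable")
    # The 1/n (or 1/(n-1)) factors in cov and the variances cancel.
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(xs, ys)
# Rescaling and shifting xs (positive scale factor) leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
print(r, r_shifted)
```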

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
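
The pair-counting definition above can be sketched as follows. This is an illustrative implementation of that formula exactly as stated (often called tau-a); the function name `kendall_tau_a` is hypothetical. Library implementations commonly apply a tie correction (tau-b), so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (made up): one swapped pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on a dataset of this size.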

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

row    df_index  longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0      12655     -121.46    38.52     29.0                3873.0       797.0           2237.0      706.0       2.1736         72100.0             INLAND
1      15502     -117.23    33.09     7.0                 5320.0       855.0           2015.0      768.0       6.3373         279600.0            NEAR OCEAN
2      2908      -119.04    35.37     44.0                1618.0       310.0           667.0       300.0       2.8750         82700.0             INLAND
3      14053     -117.13    32.75     24.0                1877.0       519.0           898.0       483.0       2.2264         112500.0            NEAR OCEAN
4      20496     -118.70    34.28     27.0                3536.0       646.0           1837.0      580.0       4.4964         238300.0            <1H OCEAN
5      1481      -122.04    37.96     28.0                1207.0       252.0           724.0       252.0       3.6964         165700.0            NEAR BAY
6      18125     -122.03    37.33     23.0                4221.0       671.0           1782.0      641.0       7.4863         412300.0            <1H OCEAN
7      5830      -118.31    34.20     36.0                1692.0       263.0           778.0       278.0       5.0865         349600.0            <1H OCEAN
8      17989     -121.95    37.27     17.0                1330.0       271.0           408.0       258.0       1.7171         181300.0            <1H OCEAN
9      4861      -118.28    34.02     29.0                515.0        229.0           2690.0      217.0       0.4999         500001.0            <1H OCEAN

Last rows

row    df_index  longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
16502  12396     -116.29    33.67     12.0                5048.0       842.0           883.0       391.0       5.6918         231300.0            INLAND
16503  16476     -121.27    38.13     35.0                2607.0       685.0           2016.0      618.0       1.7500         82900.0             INLAND
16504  2271      -119.80    36.78     43.0                2382.0       431.0           874.0       380.0       3.5542         96500.0             INLAND
16505  6980      -118.01    33.97     36.0                1451.0       224.0           608.0       246.0       6.0648         290800.0            <1H OCEAN
16506  5206      -118.28    33.93     41.0                936.0        257.0           913.0       226.0       2.0313         122600.0            <1H OCEAN
16507  15174     -117.07    33.03     14.0                6665.0       1231.0          2026.0      1001.0      5.0900         268500.0            <1H OCEAN
16508  12661     -121.42    38.51     15.0                7901.0       1422.0          4769.0      1418.0      2.8139         90400.0             INLAND
16509  19263     -122.72    38.44     48.0                707.0        166.0           458.0       172.0       3.1797         140400.0            <1H OCEAN
16510  19140     -122.70    38.31     14.0                3155.0       580.0           1208.0      501.0       4.1964         258100.0            <1H OCEAN
16511  19773     -122.14    39.97     27.0                1079.0       222.0           625.0       197.0       3.1319         62700.0             INLAND